Using LSTM and PSO techniques for predicting moisture content of poplar fibers by Impulse-cyclone Drying

Impulse-cyclone drying (ICD) is a new type of pretreatment method to remove the excess moisture of wood fibers (WFs) with high speed and low energy consumption. However, the process parameters are often determined by the experience of the process operators, thus the quality of WF drying lacks an objective basis and cannot be ensured. To address this issue, this study adopted the long short-term memory (LSTM) neural network, backpropagation neural network, and Central-Composite response surface method to establish a moisture content (MC) prediction model and a process parameter optimization model based on single-factor experiments. The initial MC, inlet air temperature, feed rate, and inlet air velocity were taken as the experimental factors, and the final MC was taken as the inspection index. The parameters of LSTM were optimized by particle swarm optimization (PSO) algorithm, and the predicted value of MC was fitted to the model. The PSO-optimized LSTM had higher prediction accuracy than did the typical prediction models. The optimal process for the targeted MC, which was obtained by PSO, was featured with an initial MC of 10.3%, inlet air temperature of 242°C, feed rate of 90 kg/h, and inlet air velocity of 8 m/s. PSO-LSTM could be a new approach for predicting the MC of WFs, which, in turn, could provide a theoretical basis for the application of ICD technology in the biomass composite industry.


Introduction
The moisture content (MC) in wood fibers (WFs) not only has an important influence on the strength and other mechanical properties of wood-plastic composites (WPCs) but also is an important factor affecting the deformation of WPCs [1]. The main chemical components of the wood cell wall include hydroxyl groups with strong water absorption and other oxygen-containing groups that can form hydrogen bonds with water [2]. Because of this hygroscopicity, the properties of the composite processing and final products may be affected. WFs have particularly high MC, and thus they should be dried to obtain a certain MC before compounding [3]. In high-temperature composition, the moisture is rapidly vaporized from the fiber surface varies greatly. The parameters set by experience may lead to unstable prediction results and reduce the prediction accuracy. In addition, neural network models are prone to vanishing or exploding gradients [17][18][19][20]. No relevant literature exists on establishing an MC prediction model for WF drying. As an improved recurrent neural network (RNN), the long short-term memory (LSTM) neural network can effectively learn the long-term dependence of temporal data, and thus it is widely used in many scientific and technological fields, such as machine reading, emotion analysis, and image description [21,22]. Moreover, the LSTM model is applied to the prediction of product performance under the interaction of multiple factors [23,24]. The prediction effect of the LSTM model depends on the reliability of input variables, but no research has explored the combination of WF drying process conditions and LSTM. Particle swarm optimization (PSO) was first proposed by Eberhart and Kennedy in 1995 [25,26]. Its basic concept comes from the study of birds' foraging behavior. The PSO algorithm is inspired by the behavior of this biological population and is used to solve optimization problems [27].
Accordingly, PSO is used to optimize the LSTM to obtain the optimal parameters. Using the obtained hyperparameters to build the model and get the prediction results improves the prediction accuracy of the model greatly. However, there is little research on the prediction of the MC. In this study, we analyzed the advantages of ICD drying over conventional oven-drying and drum-drying. To determine the relationship between the process parameters and the final MC, we used the initial MC, inlet airflow temperature, air velocity, and feed rate as input parameters and the final MC after drying as output parameter for the PSO-LSTM model. The traditional linear regression analysis, BP neural network model, and LSTM model were compared, and the model was validated by ICD experiments, which provided a theoretical basis for ICD process innovation and intelligent control. The results of this study could provide an effective method of MC prediction for the theoretical research of heat and mass transfer during drying and provide a reliable guarantee for the quality and efficiency of drying under ICD.

Description of impulse-cyclone drying
The experiments were performed in an ICD system (MQG-50, Jianda Drying Equipment Co., Ltd., Changzhou, China), which is schematically presented in Fig 2. The ICD system was equipped with an electric heater, an induced draft fan, a screw feeder, an impulse dryer, a cyclone dryer, and a cyclone separator. The power absorbed by the induced draft fan could be modified within the range 0-66 000 W, and its airflow rate could be adjusted from 1610 to 2844 m³·h⁻¹. The poplar fibers were fed through the screw feeder, whose flow was regulated by frequency-conversion speed control within the range 0-1100 W. The impulse dryer consisted of three pulse pipes (186 cm in height and 30 cm in diameter) and three straight pipes (18 cm in diameter) made of 0.2 cm thick steel sheet.

Poplar fibers sample
Because poplar veneers milled in a disintegrator readily yield fibers with uniform aspect ratio and MC, about 500 kg of poplar veneers (Zhonghan-17 fast-growing poplar, Harbin Yongxu Wood-Based Panel Co., Ltd., Heilongjiang, China) were selected as the fiber source. The size of each veneer was 1.2 mm × 40 mm × 40 mm, the air-dried density was 0.38 g/cm³, and the average MC was 13.8 (± 1.2) %. To satisfy the experimental requirements, the veneers were crushed into 60-80 mesh fiber samples by a biomass fiber crusher (MF-600, Jiangsu Fuyang Machinery Co., Ltd., Xuzhou, China), as shown in Fig 3. The material was passed through the screening machine to obtain the required form of fiber. By changing the screen of the screening machine, 60-80 mesh WF could be obtained [28]. Then, the screened WFs were randomly sampled three times, with 100 mg taken each time. Fig 4 shows the measurement process of WF morphology. The sample was placed on a glass slide and then analyzed by a high-definition digital microscope (GE-5, Shanghai Changfa Optical Instrument Co., Ltd., Shanghai, China) at a magnification of 40 times. The arithmetic mean of the three sampling measurements was taken as the measurement result, with the average length of WFs, average diameter, and aspect ratio being 1.53 (± 0.18) mm, 283 (± 35) μm, and 5.4 (± 1.61), respectively.

Drying experiments
In the experiment, the MC of WF was pretreated to obtain the initial MC. The high-pressure spray method was adopted to deal with the WF. The specific method was to put poplar fibers into the high-speed mixer and atomize them with high pressure sprayers at the feeding port to make the MC of poplar uniform. The fiber was extracted and sealed with a sealed bag and kept for 24 h, so that the MC was balanced. The WFs were first subjected to a drying environment generated using a heat generator, a screw feeder, and an induced draft fan under different temperatures (160-240˚C), inlet air velocities (9-13 m/s), and feed rates (90-150 kg/h), with the hot air being the drying medium; the fibers were dried by heat and mass exchange with the hot airflow.

Determination of moisture content
The average MC of poplar fiber was measured according to "Standard Test Methods for Direct Moisture Content Measurement of Wood and Wood-Based Materials" (ASTM D4442-2016). MC was calculated as follows:
$$\mathrm{MC} = \frac{W_A - W_B}{W_B} \times 100\% \tag{1}$$
where $W_A$ is the original mass (g), and $W_B$ is the ovendry mass (g).
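The oven-dry calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the oven-dry-basis definition MC = (W_A − W_B) / W_B × 100; the function name and the sample masses are hypothetical and are not taken from the experiments.

```python
def moisture_content(original_mass_g: float, ovendry_mass_g: float) -> float:
    """Return the oven-dry-basis moisture content in percent."""
    if ovendry_mass_g <= 0:
        raise ValueError("oven-dry mass must be positive")
    if original_mass_g < ovendry_mass_g:
        raise ValueError("original mass cannot be smaller than oven-dry mass")
    return (original_mass_g - ovendry_mass_g) / ovendry_mass_g * 100.0


# Hypothetical sample: 11.38 g before oven-drying, 10.00 g after.
mc = moisture_content(11.38, 10.00)
print(f"MC = {mc:.1f} %")
```

Because the oven-dry mass is the denominator, values above 100% are possible for very wet material, which is why the function does not cap its result.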

Different drying methods on MC uniformity and energy consumption
The fibers dried by different methods, such as ICD, oven-drying, and drum-drying, were randomly sampled, and 30 groups of samples were taken for MC determination. The weight of the fiber was 5 kg, the initial MC was 13±2%, and the fiber size was 60-80 mesh. The drying process conditions of the ICD system were an inlet air temperature of 180˚C, a feed rate of 90 kg/h, and an inlet air velocity of 9 m/s. The oven-drying conditions were a temperature of 103˚C

Multiple Linear Regression (MLR) model
According to the single-factor experimental study of ICD, combined with the actual operation results, the initial MC (A), inlet air temperature (B), feed rate (C), and inlet air velocity (D) were taken as the experimental factors, and the final MC was taken as the inspection index. To construct MLRs, the Central Composite Response Surface Methodology (RSM) design conditions were constructed using the statistical software package Design Expert 10.0.7, Stat-Ease Inc., MN (www.statease.com) [29]. Table 1 lists the ranges of each variable used. Each of the four independent variables had five levels for which the design expert software provided a combination of 30 experiments.
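A four-factor, five-level design with 30 runs matches the usual layout of a central composite design: 16 factorial points, 8 axial points, and 6 center points. The sketch below generates such a design in coded units. The rotatable axial distance (α = 2) and the six center points are assumptions chosen because they reproduce the stated totals; the text does not report them, and the actual run list produced by Design Expert may differ.

```python
from itertools import product


def central_composite_design(n_factors: int = 4, n_center: int = 6):
    """Return the coded runs of a rotatable central composite design."""
    # Rotatable axial distance: (2^k)^(1/4), which is 2.0 for four factors.
    alpha = (2 ** n_factors) ** 0.25
    # Full two-level factorial part: every combination of -1 and +1.
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    # Axial (star) points: one factor at +/- alpha, the others at the center.
    axial = []
    for k in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[k] = sign * alpha
            axial.append(point)
    # Replicated center points.
    center = [[0.0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


runs = central_composite_design()
levels = sorted({value for run in runs for value in run})
print(len(runs), levels)
```

Each coded level maps linearly onto the real range of a factor, so the five coded values correspond to the five levels per variable listed in Table 1.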

LSTM modeling
The LSTM is an improved RNN that can solve the problem of the declining perception ability of RNNs [30]. In contrast to the RNN, the LSTM adds a cell state, and the LSTM unit controls the cell state through three gates: the forgetting gate, the input gate, and the output gate, as shown in Fig 6 [31]. Each gate of the LSTM unit is constituted by a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid layer of the forgetting gate decides whether to discard information: a result of 1 means that the information is retained, and a result of 0 means that it is discarded. The operation equation of the forgetting gate is
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2}$$
where $f_t$ is the result of the forgetting gate; $W_f$ is the forgetting gate weight matrix; $x_t$ and $h_{t-1}$ are the input of the current time and the output of LSTM at the previous time, respectively; $b_f$ is the bias term of the forgetting gate; and $\sigma$ is the sigmoid activation function. By adding new information according to the sigmoid layer of the input gate and combining it with the candidate values obtained from the tanh layer, the state update amount is obtained, as shown in Eqs (3) and (4).
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{3}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$
where $i_t$ is the result of the input gate operation; $\tilde{C}_t$ is the candidate value; $W_i$ and $b_i$ are the input gate weight matrix and the bias term, respectively; and $W_c$ and $b_c$ are the weight matrix and the bias term of the cell state, respectively. Considering the information discarded in the forgetting gate, the cell state at the current time can be acquired, as shown in Eq (5).
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$
where $C_t$ is the cell state at the current time; $C_{t-1}$ is the cell state of the previous time; $f_t \odot C_{t-1}$ is the previous state after the forgetting gate has discarded information; and $i_t \odot \tilde{C}_t$ is the state update quantity. The sigmoid layer of the output gate determines which information to output and then combines it with the cell state processed by the tanh layer to obtain the output, as shown in Eqs (6) and (7).
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{6}$$
$$h_t = O_t \odot \tanh\left(C_t\right) \tag{7}$$
where $O_t$ is the operation result of the output gate; $h_t$ is the output of the current time; and $W_o$ and $b_o$ are the weight matrix and the bias term of the output gate, respectively.
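The gate operations described above can be traced in a single forward step written with NumPy. This is an illustrative sketch of the standard LSTM cell, not the implementation used in the study: the random weights, the dictionary layout of the parameters, and the example input are hypothetical, while the four inputs and 50 hidden units follow the model description in the text.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """Run one LSTM time step and return the new output and cell state."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forgetting gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate value
    c_t = f_t * c_prev + i_t * c_hat                      # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # output at time t
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 4, 50  # four process inputs, 50 LSTM units
params = {}
for gate in "fico":
    # Hypothetical small random weights; a trained model would supply these.
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

x_t = np.array([0.5, 0.8, 0.2, 0.4])  # hypothetical normalized inputs
h_t, c_t = lstm_step(x_t, np.zeros(n_hidden), np.zeros(n_hidden), params)
print(h_t.shape, c_t.shape)
```

Element-wise multiplication (`*`) plays the role of the pointwise product in the gate equations, so a forgetting-gate value near 0 erases the corresponding entry of the previous cell state.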

Optimizing LSTM prediction model by PSO
To make the prediction model better match the data characteristics of MC under different working conditions, the PSO algorithm was used to optimize the LSTM model, and a PSO-LSTM model was constructed to obtain a better parameter combination. First, the batch size and the number of hidden layer units were randomly initialized within a given range as the initial parameters of the LSTM model. The initial model was trained on the divided training data, the trained model then predicted the verification data, and the mean absolute percentage error of the prediction results was taken as the fitness function f. The fitness function f is defined as
$$f = \frac{1}{M}\sum_{m=1}^{M}\left|\frac{\hat{y}_m - y_m}{\hat{y}_m}\right| \tag{8}$$
where $\hat{y}_m$ is the m-th tag value, $y_m$ is the m-th predicted value, and M is the number of samples. The number of iterations was 500, and the inertia constant was 0.7. When the number of particle iterations reached 500 or the fitness value reached the set requirements, the iteration was stopped.
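For reference, an inertia constant of this kind enters the commonly used inertia-weight form of the PSO update, sketched below. The acceleration coefficients $c_1$, $c_2$ and the uniform random numbers $r_1$, $r_2$ are not reported in the text and appear only as symbols; $x_i^{k}$ is the position of particle i at iteration k (here, a candidate pair of batch size and hidden-unit count), $p_i$ is the best position that particle has found, and $g$ is the best position found by the whole swarm.

```latex
\begin{aligned}
v_i^{k+1} &= \omega\, v_i^{k} + c_1 r_1 \left(p_i - x_i^{k}\right) + c_2 r_2 \left(g - x_i^{k}\right), \qquad \omega = 0.7,\\
x_i^{k+1} &= x_i^{k} + v_i^{k+1}.
\end{aligned}
```

An inertia weight below 1 damps the previous velocity at each step, which tends to shift the swarm from broad exploration toward local refinement as the iterations proceed.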

Data preprocessing
In this stage, the neural network model was studied and constructed under the TensorFlow learning framework based on Python. The TensorFlow 2.0 library was loaded into Anaconda in advance, and then the NumPy, Pandas, and Matplotlib libraries in Python data analysis were imported. In the data preprocessing stage, the original data obtained by the RSM experiment design were simply processed, and the data set was divided into the training set and the test set. The training data for model training accounted for 70.4%, and the test data covered 29.6% to test the generalization error of the model.
To increase the speed and accuracy of the neural network gradient descent to find the optimal solution, the input data were normalized, and all the numerical information was gathered within the range 0-1 [32]. After the model construction, denormalization was carried out; the normalization equation is
$$x = \frac{I_i - I_{\min}}{I_{\max} - I_{\min}} \tag{9}$$
where $x$ is the normalized input value, $I_i$ is the sample data value before normalization, $I_{\min}$ is the minimum value in the sample data, and $I_{\max}$ is the maximum value in the sample data.
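Min-max scaling and its inverse can be sketched as follows. The helper names and the three sample rows are hypothetical; the column order (initial MC, inlet air temperature, feed rate, inlet air velocity) follows the model inputs named in the text, and scaling is applied per column.

```python
import numpy as np


def fit_min_max(data):
    """Return the column-wise minimum and maximum of the sample data."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)


def normalize(data, i_min, i_max):
    """Map each column linearly onto the range 0-1."""
    return (np.asarray(data, dtype=float) - i_min) / (i_max - i_min)


def denormalize(x, i_min, i_max):
    """Invert the scaling to recover values in the original units."""
    return np.asarray(x, dtype=float) * (i_max - i_min) + i_min


# Hypothetical rows: initial MC (%), inlet temperature (deg C),
# feed rate (kg/h), inlet air velocity (m/s).
samples = np.array([
    [10.0, 160.0, 90.0, 9.0],
    [15.0, 200.0, 120.0, 11.0],
    [20.0, 240.0, 150.0, 13.0],
])
i_min, i_max = fit_min_max(samples)
scaled = normalize(samples, i_min, i_max)
restored = denormalize(scaled, i_min, i_max)
print(scaled)
```

In practice the minimum and maximum would be taken from the training set only and reused for the test set, so that no test information leaks into the scaling; a column with identical values would need special handling to avoid division by zero.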

Model evaluation index
In this study, the mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), Pearson correlation coefficient, r, and determination coefficient, $R^2$, between predicted data and real data were chosen as the evaluation indexes of model performance. MAE accurately reflects the error between the predicted value and the real value. The smaller the MAE is, the closer the predicted value is to the real value, and the more accurate the prediction is. MAE is expressed by Eq (10).
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \tilde{y}_i\right| \tag{10}$$
MSE shows the difference between the predicted value and the real value. The smaller the MSE value is, the smaller the difference between the predicted value and the real value is, and the better the accuracy of the model is. MSE is expressed by Eq (11).
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \tilde{y}_i\right)^2 \tag{11}$$
MAPE indicates the percentage of relative error between the predicted value and the real value. The smaller the MAPE value is, the better the model is. MAPE is expressed by Eq (12).
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \tilde{y}_i}{y_i}\right| \tag{12}$$
Pearson correlation coefficient, r, refers to the linear correlation between the predicted value and the real value. The closer r is to 1, the better the correlation between the predicted value and the real value is. r is expressed by Eq (13).
$$r = \frac{\mathrm{COV}(Y, \tilde{Y})}{\sqrt{\mathrm{VAR}(Y)\,\mathrm{VAR}(\tilde{Y})}} \tag{13}$$
The determination coefficient, $R^2$, displays the reliability of the model. The closer $R^2$ is to 1, the more reliable the model is. $R^2$ is expressed as Eq (14).
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \tilde{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - y_{i.\mathrm{ave}}\right)^2} \tag{14}$$
where n is the number of test data sets, $y_i$ is the true value of the i-th sample point, $\tilde{y}_i$ is the predicted value of the i-th sample point, $y_{i.\mathrm{ave}}$ is the average value of the sample real value, $Y$ is the true value of the sample, and $\tilde{Y}$ is the predicted value of the model. Further, $\mathrm{COV}(Y, \tilde{Y})$ indicates the covariance of $Y$ and $\tilde{Y}$; $\mathrm{VAR}(Y)$ denotes the variance of $Y$, and $\mathrm{VAR}(\tilde{Y})$ represents the variance of $\tilde{Y}$.
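The five evaluation indexes can be computed together with NumPy, as in the sketch below. The function name and the example values are hypothetical, and the standard textbook definitions of the metrics are assumed; the sample covariance and sample variances use the same divisor, so r does not depend on that choice.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MAE, MSE, MAPE (%), Pearson r, and R^2 for two series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    mape = np.mean(np.abs(err / y_true)) * 100.0  # requires nonzero true values
    cov = np.cov(y_true, y_pred)[0, 1]            # sample covariance (ddof=1)
    r = cov / np.sqrt(np.var(y_true, ddof=1) * np.var(y_pred, ddof=1))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "r": r, "R2": r2}


# Hypothetical true and predicted final MC values (%).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

MAPE divides by the true value, so it becomes unstable when the final MC approaches zero; for very dry samples, MAE or MSE may be the more informative index.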

Model parameter setting
In this study, the Python programming language with the TensorFlow 2.0 framework was used to build the BP neural network (Fig 7). A three-layer structure was adopted in the network, with four input layer nodes: the initial MC, inlet air temperature, feed rate, and inlet air velocity. There were 50 hidden layer nodes and one output node, which corresponds to the final MC obtained from the experiment. The sigmoid function was selected to activate the hidden layer, the learning rate was set to 0.001, and the number of iterations was set to 1000. Furthermore, the adaptive moment estimation (Adam) optimizer was adopted to update the weights of the neural network [33]. As shown in Fig 8, the framework of the multivariable LSTM prediction model can be divided into three parts: the input layer, the LSTM layer, and the output layer. The input of the input layer of the LSTM model is $I = (I_1, I_2, I_3, I_4)$. After normalization and weighting, the input unit becomes the input $x_i$ of the LSTM layer. The number of neural units in the LSTM layer is 50. After processing, the output of the LSTM layer is $H_i$, which becomes the input of the output layer after weighted processing. Following the fully connected layer, the output data are denormalized to get the predicted value of the final MC, y. When the error between the actual output and the expected output exceeds the specified accuracy, training enters the error backpropagation stage. The weights of each layer are corrected by descending the error gradient, and the error propagates back from the output layer to the LSTM layer and the input layer.

Single-factor results
The effects of different process parameters on the final MC of WF were studied. The feed rate was fixed at 120 kg/h and the inlet air velocity at 11 m/s, and the inlet temperature was varied over 120˚C, 140˚C, 160˚C, 180˚C, 200˚C, 220˚C, and 240˚C. In the same way, the feed rate was fixed at 120 kg/h and the inlet temperature at 200˚C, and the inlet air velocity was varied over 7 m/s, 8 m/s, 9 m/s, 10 m/s, 11 m/s, 12 m/s, and 13 m/s. Similarly, the inlet air velocity was fixed at 11 m/s and the inlet temperature at 200˚C, and the feed rate was varied over 60 kg/h, 75 kg/h, 90 kg/h, 105 kg/h, 120 kg/h, 135 kg/h, and 150 kg/h. The experiment was repeated three times, and each index was determined three times under the same conditions. Fig 9(A) shows how the final MC of poplar fiber changes with the inlet air temperature. The higher the inlet air temperature of the dryer is, the faster the water molecules move, which promotes vaporization at the fiber surface and increases the temperature and humidity gradients between the inside and outside of the fibers. Accordingly, the drying temperature should be increased to obtain the lowest final MC. Fig 9(B) illustrates how the final MC of poplar fiber changes with the inlet air velocity. With an increase in the inlet air velocity, the final MC shows an upward trend and then a downward trend. Fig 9(C) demonstrates how the final MC of poplar fiber changes with the feed rate. The final MC shows an upward trend with an increase in the feed rate, mainly because the total evaporation of fiber water increases and the air temperature decreases as the feed rate increases. In this case, the driving force for heat and mass transfer and the mass transfer rate decrease.

Operation and verification of ICD system
In this drying system, anemometers were set in the impulse dryer to measure the air velocity in the straight pipe and in the impulse pipe, and the monitoring system recorded the results once the readings were stable. During the experiment, the temperature of the dryer was 160˚C, and the frequency of the induced draft fan was 0-50 Hz. When processing the data, the blank air velocity measured with the induced draft fan switched off was subtracted from each reading, and the arithmetic mean of five measurements was taken as the air velocity value. When materials were added, the results were recorded in the steady state. Fig 10 shows that the air velocities of the straight pipe and impulse pipe are positively correlated with the frequency of the induced draft fan: the greater the frequency is, the greater both air velocities are. According to the experimental results, under the same induced draft fan frequency, the air velocity of the impulse pipe was lower than that of the straight pipe, which indicated that the airflow slowed down in the impulse pipe. The higher the MC of the added material is, the more obvious the decrease in the air velocities of the straight pipe and impulse pipe is; a greater MC puts more moisture into the air, and the decrease in the air velocity becomes more obvious.

Different drying methods on MC uniformity
To compare the results of ICD, oven-drying, and rotary drum-drying [34], 30 samples were randomly taken from the dried WF to investigate the final MC. The data were sorted in Excel and subjected to analysis of variance (ANOVA) using SPSS 7.05 data processing system (IBM Inc., USA). The results are presented in Fig 11 and Table 2. According to the results, the MC of WF dried by ICD was more uniform than that dried by the other two drying methods. This indicated that, during the stable discharging stage, WFs were saturated in gas phase and had higher MC in air, which affected mass transfer efficiency to a certain extent. The final MC of WF could reach the 1-3% required for WPCs.
Fig 10 legend: (1) Straight pipe airflow velocity of 10 wt% wood fibers, (2) Impulse pipe airflow velocity of 10 wt% wood fibers, (3) Straight pipe airflow velocity of 30 wt% wood fibers, (4) Impulse pipe airflow velocity of 30 wt% wood fibers, (5) Straight pipe airflow velocity of 50 wt% wood fibers, (6) Impulse pipe airflow velocity of 50 wt% wood fibers, (7) No-load straight pipe airflow velocity, (8) …

Energy consumption cost comparison of different drying methods
The fiber with an initial MC of about 20% was mainly used to observe the energy consumption of different drying methods to select a more energy-saving and efficient fiber drying process. The temperature of ICD was 220˚C, inlet airflow was 11 m/s, feed rate was 120 kg/h. The energy consumption was under the best operating conditions determined in the experiment, and the electric heater was heated by two groups of heating tubes. Owing to the existence of automatic switch, according to the calculation, the electric loss was 70% of the actual electric quantity at 220˚C. Table 3 shows the optimum operating parameters according to the pilot plant established in the experiment. Taking drying 1000 kg wet fiber as an example, the energy consumption costs of the three methods were calculated, as shown in Table 4 (in which the electricity charge was 0.7 CNY/kWh according to the industrial electricity in China). It can be noticed that the energy consumption cost of ICD per 1000 kg of wet fiber was 329 CNY, which was less than that of oven-drying and rotary drum-drying, proving the superiority of ICD. The equipment was obtained on the basis of a pilot test. After engineering application in the future, the process intensification could be further improved, and the energy consumption cost could be further reduced.
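The relationship between the reported cost and the underlying electricity use is simple arithmetic, sketched below. The electricity price and the ICD cost per 1000 kg of wet fiber are the values stated in the text; the implied energy figure is a back-calculation for illustration only and is not a value quoted from Table 4.

```python
ELECTRICITY_PRICE_CNY_PER_KWH = 0.7   # industrial tariff stated in the text
ICD_COST_CNY_PER_TONNE = 329.0        # ICD cost per 1000 kg of wet fiber


def energy_from_cost(cost_cny: float, price_cny_per_kwh: float) -> float:
    """Return the electricity use (kWh) implied by a cost and a tariff."""
    return cost_cny / price_cny_per_kwh


def cost_from_energy(energy_kwh: float, price_cny_per_kwh: float) -> float:
    """Return the electricity cost (CNY) for a given energy use."""
    return energy_kwh * price_cny_per_kwh


energy_kwh = energy_from_cost(ICD_COST_CNY_PER_TONNE, ELECTRICITY_PRICE_CNY_PER_KWH)
print(f"Implied energy use: {energy_kwh:.0f} kWh per 1000 kg of wet fiber")
print(f"Implied specific energy: {energy_kwh / 1000:.2f} kWh per kg")
```

The same two helpers can be applied to the oven-drying and drum-drying rows of Table 4 to compare the methods on an energy basis rather than a cost basis.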

MLRs
The results of the RSM experiment were analyzed by Design Expert 10.0.4, and the data of the RSM experiment were fitted with the multiple regression model analyzed by ANOVA. The results of ANOVA are shown in Table 5. The F-value of the model is 71.86, and the P-value is lower than 0.0001, indicating that the model is significant.
The effects of the initial MC (A) and inlet air temperature (B) on the final MC were extremely significant (P < 0.0001), that of feed rate (C) was generally significant, and that of inlet air velocity (D) was not significant. By comparing the mean square values, it can be inferred that the order of process parameters affecting the final MC of WF is A > B > C > D. $A^2$ is a significant factor in the quadratic term, and the rest are not significant. The P-value of the lack of fit is 0.2762 (P > 0.05), revealing that the lack of fit was not significant. The fitted regression model is
$$Y = 4.31 - 0.95A + 0.18B + 0.50C + 2.14D - 0.60AB - 0.062AC - 0.52AD - 0.012BC - 0.050BD + 0.21CD - 0.005A^2 + 0.001B^2 - 0.009C^2 - 0.004D^2 \tag{15}$$
To determine the best fitting degree of the selected model, the normal probability plot of the residuals was taken from the Design Expert software (Fig 12). According to the normal probability plot, most of the final MC data points fall on the straight line of the normal distribution of the response data. This means that the response data set, i.e., the final MC, is normally distributed relative to the proposed linear model. By using the model graph option in the analysis module, the contour lines and response surface diagrams for evaluating the interaction strength of various experimental factors can be obtained. Thus, they can be used to predict the interaction of variables, so that the optimal drying process parameters can be determined. The interaction between various factors is shown in Fig 13. As shown in Fig 13(A), with a decrease in the initial MC (A) and an increase in the inlet air temperature (B), the final MC decreased significantly, i.e., the initial MC and inlet air temperature jointly affect the final MC. The bulge on the response surface is more intuitive in Fig 13(A) than in Fig 13(B), which shows that the influence of inlet air temperature on the final MC is more significant.
The reason is that the higher the inlet air temperature is, the larger the temperature gradient difference will be. Therefore, the speed of moisture migration inside the fiber to the surface will increase, further resulting in the lower final MC of the fiber. Fig 13(B) illustrates that, with a decrease in the initial MC and feed rate, the final MC decreases dramatically, since fewer materials are put into the pipeline simultaneously, and the MC of the airflow in the pipeline is less, which improves the MC gradient inside the fiber and on its surface. The

LSTM results
Taking the Design Expert's experimental design parameters and experimental results as training samples, the program was developed in the Spyder environment of Anaconda software. The initial weights and thresholds were assigned to the BP neural network and the LSTM for learning and updating, and the final MC prediction model was established. The iteration number of the initial network parameters was set to 1000, while the learning rate was set to 0.001. Besides, 30 groups of data in the sample were selected as training data, and 12 groups were used as test data to evaluate the neural network model. By adopting different neural network models to train the model on the training set and test on the test set, the performance of different neural network models under different parameter settings was examined. The regression analysis of the final MC was carried out on the test sets of RSM, the BP neural network model, and the LSTM. Fig 14 shows the comparative analysis on the experimental predicted values of the three models. The evaluation indexes of RSM, the BP model, and the LSTM model are obtained by calculation. The results are shown in Table 6.

Particle Swarm Optimization (PSO) to optimize processing parameters
PSO is a computational model to search for the optimal solution by simulating the natural evolution process. The extremum optimization of PSO takes the predicted result of the trained neural network as the individual fitness value and searches for the global optimal value and corresponding input value of the function through selection, crossover, and mutation operations. The parameters set in the PSO were 100 iterations, a population size of 20, a crossover probability of 0.4, a mutation probability of 0.2, and an individual length of 1. The results are shown in Table 7. The optimal process parameters of the BP-PSO model were an initial MC of 10.6%, an inlet air temperature of 238˚C, a feed rate of 90 kg/h, and an inlet air velocity of 9 m/s. The optimal process parameters of the LSTM-PSO model were an initial MC of 10.3%, an inlet air temperature of 242˚C, a feed rate of 90 kg/h, and an inlet air velocity of 8 m/s. The optimal process parameters obtained by the three methods were compared.
In the optimization problem, response surface analysis can only evaluate the drying process at the known factor levels, thus the optimization results are not global. BP and LSTM, as intelligent algorithms, can perform global optimization in combination with PSO, and the accuracy of the optimization results is positively related to the accuracy of the selected neural network.

Prediction of PSO-LSTM combined model with expanded sample size
The PSO algorithm first initializes a group of particles in the feasible solution space and employs three indicators (position, speed, and fitness) to represent the characteristics of each particle; the fitness value is used as the standard to measure the quality of particles. In this paper, the particle position corresponds to the initial MC, inlet temperature, feed rate, and inlet velocity, and the fitness value corresponds to the final MC. Particles move in the solution space and update individual positions by tracking individual and global optimal positions. The fitness value is recalculated every time a particle updates its position, and the individual and global optimal positions are updated by comparing the fitness value of the new particle with that of the individual and global optimal positions. The process of PSO is shown in Fig 15. The sample data were expanded to a total of 400 groups, 304 of which were used for model training and the remaining 96 for prediction. The PSO algorithm was used to optimize the number of hidden layer neurons and the learning rate of the LSTM model; it was also used to judge whether to update the individual and global optimal solutions until the termination conditions were met to obtain the optimal parameters. In the training process, the optimized LSTM model was obtained by calculating the particle fitness function and updating the particle position simultaneously. The LSTM and PSO-LSTM prediction models are shown in Fig 16.
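The particle update cycle described above can be sketched as a short NumPy routine. This is a generic inertia-weight PSO written for illustration, not the code used in the study. The population size (20), iteration count (100), and inertia constant (0.7) are taken from the text; the acceleration coefficients `c1` and `c2`, the initial MC bounds, and the quadratic `surrogate_final_mc` function standing in for the trained network are all hypothetical.

```python
import numpy as np


def pso_minimize(fitness, lower, upper, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` inside box bounds with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))  # positions
    vel = np.zeros((n_particles, dim))                        # speeds
    pbest_pos = pos.copy()                                    # individual bests
    pbest_val = np.array([fitness(p) for p in pos])
    g = np.argmin(pbest_val)
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]  # global best
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Track both the individual and the global optimal positions.
        vel = (inertia * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, lower, upper)  # stay in the feasible space
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    return gbest_pos, gbest_val


# Order: initial MC (%), inlet temperature (deg C), feed rate (kg/h), velocity (m/s).
lower = np.array([10.0, 160.0, 90.0, 9.0])    # initial MC bound is hypothetical
upper = np.array([20.0, 240.0, 150.0, 13.0])  # initial MC bound is hypothetical
target = np.array([12.0, 220.0, 100.0, 11.0]) # hypothetical optimum of the surrogate


def surrogate_final_mc(x):
    """Hypothetical stand-in for the trained network's predicted final MC."""
    z = (x - target) / (upper - lower)
    return 2.0 + float(np.sum(z ** 2))


best_x, best_f = pso_minimize(surrogate_final_mc, lower, upper)
print(best_x, best_f)
```

In the workflow described in the text, the surrogate would be replaced by a call to the trained BP or LSTM model, so each fitness evaluation is one forward prediction of the final MC for a candidate set of process parameters.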

Effect of different drying methods on MC uniformity
Among the different drying methods, the ICD had the lowest variable coefficient of 0.0798, proving the best MC uniformity (Table 2). The lightweight fibers in the ICD system were discharged from the system first, whereas the heavier fibers were repeatedly distributed and arranged in the system, and the drying time was longer, thus the MC of the fibers was more uniform. In addition, although the variable coefficient of the rotary drum-drying method (0.1043) was only slightly higher than that of the ICD method (0.0798), the rotary drum-drying method took a longer time. Oven-drying, although a common method, resulted in poor uniformity of the fiber MC (Fig 11). The lower MC could easily cause fracture of fibers when fiber and plastic were mixed, resulting in lower mechanical properties of WPCs. One possible reason is that the heat and mass transfer efficiency of different parts of the exposed accumulated fibers was different. In addition, the exposure of poplar fibers to hot air was less, and the bound water was difficult to discharge from the fibers. Therefore, uniform MC could be obtained more easily by the ICD method and the drum-drying method, and the drying time of ICD was shorter than that of drum-drying.
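The variable coefficient used to rank uniformity is the ratio of the standard deviation to the mean of the sampled final MC values. The sketch below assumes the sample standard deviation (divisor n − 1), which the text does not specify; both example series are hypothetical and only illustrate that a tighter spread yields a smaller coefficient.

```python
import numpy as np


def coefficient_of_variation(values) -> float:
    """Return the sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(np.std(values, ddof=1) / np.mean(values))


# Hypothetical final MC readings (%) for two drying methods.
uniform_batch = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8]
scattered_batch = [2.0, 2.6, 1.4, 2.3, 1.7, 2.0]

cv_uniform = coefficient_of_variation(uniform_batch)
cv_scattered = coefficient_of_variation(scattered_batch)
print(f"{cv_uniform:.4f} {cv_scattered:.4f}")
```

Because the coefficient is dimensionless, it allows batches with different mean final MC to be compared directly, which is what makes it suitable for ranking the three drying methods.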

Relationship between drying parameters and final MC
The airflow temperature is the main external factor determining the drying efficiency [35]. As the inlet air temperature increased, the drying rate of WFs increased, and the time needed to reach the required final MC shortened. The evaporation of moisture from the WF surface accelerated with air temperature, and because the MC and temperature gradients between the inside and outside of the WF increased, moisture also diffused faster within the WF. Therefore, increasing the airflow temperature was conducive to increasing the drying speed of WFs. However, the higher the air temperature was, the higher both the moisture removal rate and the energy consumption were. The temperatures of the WFs and of the moisture in the fibers rose with the airflow temperature. WF drying is the process of using heat energy to remove water from fibers, thus the MC of WFs decreases continuously. According to the variance analysis (Table 5), the influence of the initial MC on the final MC of fibers was extremely significant, so the initial MC played an important role in the fiber drying process. First, the initial MC of wet fibers should be uniform; otherwise, the process parameters are very difficult to control, which leads to an unqualified final MC. Second, a low initial MC is better for drying because the drying time is short and the energy consumption is low. WFs used in WPCs come from a wide range of sources, mostly low-value wood, processing residues, and waste wood, and therefore have a wide range of initial MCs. The final MC for WPCs should be in the range of 1-3% to ensure sufficient fluidity in the compounding process without many pores, so that the produced composites can meet the national standards of China [36]. When the MC was lower than the fiber saturation point, the remaining water was held in the noncrystalline regions of cellulose. The hydroxyl groups of the cellulose macromolecules bound water molecules through hydrogen bonds, so more heat was needed to remove this water. If the final MC was too low, especially near 0%, the ICD system could easily catch fire.
The resistance to moisture removal during WF drying can be divided into internal and external resistance [37]. The internal resistance is mainly related to the length-diameter ratio, MC, temperature, and other properties of the WFs. The external resistance is directly related to the airflow velocity, which affects the mass exchange coefficient between the air and the WF surface. Airflow velocity was therefore an important external factor affecting the drying of WFs. When the airflow velocity increased, the WFs had a shorter residence time in the equipment, and the final MC showed an increasing trend. In contrast, when the inlet velocity increased from 11 to 13 m/s, the final MC showed a decreasing trend, indicating that the higher airflow velocity disrupted the boundary layer on the WF surface so that the surface moisture was quickly carried away as vapor. Because the heat and moisture transfer conditions improved and the drying process accelerated, a proper increase in airflow velocity was conducive to improving fiber drying efficiency.
Feed rate also had a certain influence on the final MC. As the feed rate increased, the final MC showed an upward trend. When more fibers were fed simultaneously, the humidity of the airflow in the pipe rose, which reduced the MC gradient between the interior and the surface of the fiber, so that water migrated more slowly and the final MC was higher. Increasing the feed rate could reduce the drying efficiency because the unit energy consumption was unchanged while the total energy consumption increased sharply [38]. Secondary drying or additional heating was then required during the drying process. Therefore, if the feed rate was too high, the drying time would be too long, which could affect the next operation of ICD.

Analysis of the ability of PSO-LSTM to predict MC
LSTM can mitigate the gradient problems that arise when training neural networks, maintain better accuracy, and overcome the limitation that traditional neural networks can only handle the prediction of linear sequences. Twelve groups of sample data were selected as the test set. As shown in Fig 14, the LSTM had good fitting accuracy; its determination coefficient, R², was 0.9446, which was higher than those of RSM and BP (Table 6). Therefore, LSTM could be characterized as more stable. The LSTM model had the best prediction performance, followed by the BP method. Compared with the traditional BP, the LSTM could effectively learn the long-term dependence of process factors to achieve the ideal prediction effect. The relative error of the PSO-LSTM algorithm under the optimal process conditions was lower than those of the response surface and BP-PSO models (Table 7), indicating that the MC prediction model based on PSO-LSTM was effective in optimizing the process parameters of the ICD.
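How a particle swarm can search the four process parameters for a targeted final MC is illustrated by the minimal sketch below. It is not the implementation used in this study. The search bounds, the target MC, the swarm settings, and the linear `surrogate_mc` function (a toy stand-in for a trained prediction model such as the LSTM) are all hypothetical, and the squared deviation from the target is an assumed objective.

```python
import random

# Hypothetical search bounds: initial MC (%), inlet air temperature (deg C),
# feed rate (kg/h), inlet air velocity (m/s). These are not the paper's ranges.
BOUNDS = [(8.0, 16.0), (180.0, 260.0), (60.0, 150.0), (7.0, 13.0)]
TARGET_MC = 2.0  # hypothetical target final MC (%)


def surrogate_mc(x):
    """Toy stand-in for a trained MC predictor; the coefficients are made up."""
    initial_mc, temperature, feed_rate, velocity = x
    return (0.25 * initial_mc
            - 0.02 * (temperature - 180.0)
            + 0.01 * (feed_rate - 60.0)
            + 0.05 * (velocity - 7.0))


def objective(x):
    """Squared deviation of the predicted final MC from the target."""
    return (surrogate_mc(x) - TARGET_MC) ** 2


def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with positions clamped to the bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_x, best_val = pso(objective, BOUNDS)
print("best parameters:", [round(v, 2) for v in best_x])
print("predicted final MC:", round(surrogate_mc(best_x), 3))
```

In practice the surrogate would be replaced by the trained prediction model, and the same swarm update could also be applied to the hyperparameters of that model rather than to the process parameters.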
The sample data were then expanded to 400 groups (Table 8), and the neural network model was retrained. The results showed that the Pearson correlation coefficient for the 96 groups of test data was higher than that for the 12 groups of test data. The higher determination coefficient indicated that expanding the sample size improved the prediction model, suggesting that this model has excellent predictive potential.
To verify the accuracy of the proposed scheme, the BP, LSTM, and PSO-LSTM models were used to predict the MC on the same data set. A comparison on the 96 groups of test data showed that the results simulated by PSO-LSTM were better than those of the LSTM model (Fig 16): the predicted MC was closer to the actual value, and the root mean square error (RMSE), MAE, and MAPE were better than those of the other models (Table 8).
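The error measures used for this comparison (root mean square error, MAE, MAPE, and the determination coefficient R²) can be computed as in the minimal sketch below. The standard textbook definitions are assumed, since the exact formulas are not restated here, and the MC values are hypothetical rather than taken from Table 8.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (%), and R^2 for paired observations.

    Assumes standard definitions; MAPE requires all actual values to be nonzero.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical measured and predicted final MC values (%).
actual_mc = [1.0, 2.0, 3.0, 4.0]
predicted_mc = [1.1, 1.9, 3.2, 3.8]

metrics = regression_metrics(actual_mc, predicted_mc)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAE, and MAPE and an R² closer to 1 indicate predictions closer to the measured values, which is the basis on which the models are ranked.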

Conclusion
The investigation into the energy consumption of ICD, oven-drying, and drum-drying of poplar fibers revealed that ICD had lower energy consumption and lower costs. By comparing the MLR model, BP neural network model, and LSTM neural network model, a prediction model of the MC of WFs dried by ICD was built. The variance analysis of RSM showed a highly significant relationship between the process factors of the regression model and the final MC of poplar fiber. The initial MC and inlet air temperature in the model were extremely significant (P < 0.001), while the feed rate was significant (P < 0.05). The MSE and MAPE of the LSTM model were smaller than those of the MLR model and the BP model. Under the PSO algorithm, the final MC obtained by optimizing the LSTM model was 0.96%, and the error was lower than the 1.33% and 1.43% obtained with the RSM model and the BP model, respectively. The PSO-LSTM method was more suitable for process optimization and for predicting the MC of WFs. The analysis method adopted in this study could provide a reference for optimizing the processing of WF-based WPCs and lay a theoretical foundation for the application of ICD technology in the biomass composite industry.
In this study, only some process parameters were considered as factors affecting the final MC. In future studies, additional factors can be considered, such as the length-diameter ratio and wood species. In addition, the PSO-LSTM model could be combined with other models to further verify its prediction accuracy.